D2KE: From Distance to Kernel and Embedding
Authors
Abstract
For many machine learning problem settings, particularly with structured inputs such as sequences or sets of objects, a distance measure between inputs can be specified more naturally than a feature representation. However, most standard machine learning models are designed for inputs with a vector feature representation. In this work, we consider the estimation of a function f : X → R based solely on a dissimilarity measure d : X × X → R between inputs. In particular, we propose a general framework to derive a family of positive definite kernels from a given dissimilarity measure, which subsumes the widely used representative-set method as a special case, and relates to the well-known distance substitution kernel in a limiting case. We show that functions in the corresponding Reproducing Kernel Hilbert Space (RKHS) are Lipschitz-continuous w.r.t. the given distance metric. We provide a tractable algorithm to estimate a function from this RKHS, and show that it enjoys better generalizability than Nearest-Neighbor estimates. Our approach draws from the literature of Random Features, but instead of deriving feature maps from an existing kernel, we construct novel kernels from a random feature map that we specify given the distance measure. We conduct classification experiments with such disparate domains as strings, time series, and sets of vectors, where our proposed framework compares favorably to existing distance-based learning methods such as k-nearest-neighbors, distance-substitution kernels, pseudo-Euclidean embedding, and the representative-set method.
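The idea of building a kernel from a random feature map defined through the distance can be illustrated with a minimal sketch. The abstract does not spell out the feature map, so the form exp(-gamma * d(x, omega)) used here is an assumption, as are the names `random_distance_features` and `toy_string_distance`, the toy string dissimilarity, and the way random objects are sampled. It is not the authors' implementation.

```python
import numpy as np

def random_distance_features(xs, random_objects, dist, gamma=1.0):
    """Embed each input as R features exp(-gamma * d(x, omega_j)) / sqrt(R).

    The exponential form is an assumption made for illustration only.
    """
    R = len(random_objects)
    feats = np.array(
        [[np.exp(-gamma * dist(x, w)) for w in random_objects] for x in xs]
    )
    return feats / np.sqrt(R)

def toy_string_distance(a, b):
    """Hypothetical dissimilarity: position-wise mismatches plus length gap."""
    mismatches = sum(ca != cb for ca, cb in zip(a, b))
    return mismatches + abs(len(a) - len(b))

rng = np.random.default_rng(0)
alphabet = list("abc")

# Sample R = 64 short random strings to play the role of random objects.
random_objects = [
    "".join(rng.choice(alphabet, size=rng.integers(1, 5))) for _ in range(64)
]

inputs = ["abca", "abcb", "ccc"]
Z = random_distance_features(inputs, random_objects, toy_string_distance, gamma=0.5)

# The inner product of explicit features is a positive semidefinite Gram matrix.
K = Z @ Z.T
```

Because `K` is a Gram matrix of explicit feature vectors, it is positive semidefinite regardless of whether the dissimilarity satisfies the metric axioms, and the rows of `Z` can be passed directly to any linear model.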
Similar resources
Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques
The distance metric plays a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
A Novel Distance-based Classifier Using Convolution Kernels and Euclidean Embeddings
Distance-based classification methods such as the nearest-neighbor and k-nearest-neighbor classifiers have to rely on a metric or distance measure between points in the input space. For many applications, Euclidean distance in the input space is not a good choice and hence more complicated distance measures have to be used. In this paper, we propose a novel kernel-based method that achieves Euc...
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
Sequence Classification in the Jensen-Shannon Embedding
This paper presents a novel approach to the supervised classification of structured objects such as sequences, trees and graphs, when the input instances are characterized by probability distributions. Distances between distributions are computed via the Jensen-Shannon (JS) divergence, which offers several advantages over the L2 distance or the Kullback-Leibler divergence. The JS divergence indu...
Graph clustering using heat kernel embedding and spectral geometry
In this paper we study the manifold embedding of graphs resulting from the Young-Householder decomposition of the heat kernel. We aim to explore how the sectional curvature associated with the embedding can be used as a feature for the purposes of gauging the similarity of graphs, and hence clustering them. The curvature is computed from the difference between the geodesic (edge weight) and the E...
Journal title:
- CoRR
Volume abs/1802.04956, issue -
Pages -
Publication date 2018